The D0 Collaboration presents first evidence for the production of single top quarks at the Fermilab Tevatron $p\bar{p}$ collider. Using a 0.9 fb$^{-1}$ dataset, we apply a multivariate analysis to separate signal from background and measure $\sigma(p\bar{p} \to tb + X,\ tqb + X) = 4.9 \pm 1.4$ pb. The probability to measure a cross section at this value or higher in the absence of signal is 0.035%, corresponding to a 3.4 standard deviation significance. We use the cross section measurement to directly determine the CKM matrix element that describes the $Wtb$ coupling and find $0.68 < |V_{tb}| < 1$ at 95% C.L. within the standard model.
Top quarks were first observed in strong $t\bar{t}$ pair production at the Tevatron collider in 1995 [1]. In the standard model (SM), $\sigma(p\bar{p} \to t\bar{t} + X) \approx 6.8$ pb [2] at $\sqrt{s} = 1.96$ TeV for a top quark mass of 175 GeV. Top quarks are also expected to be produced singly via the electroweak processes [3, 4] illustrated in Fig. 1. For brevity, we use the notation "tb" to represent the sum of $t\bar{b}$ and $\bar{t}b$, and "tqb" for the sum of $tq\bar{b}$ and $\bar{t}\bar{q}b$. The next-to-leading order (NLO) prediction for the s-channel single top quark cross section is $\sigma(p\bar{p} \to tb + X) = 0.88 \pm 0.11$ pb, and for the t-channel process, the prediction is $\sigma(p\bar{p} \to tqb + X) = 1.98 \pm 0.25$ pb [5, 6].



FIG. 1: Representative Feynman diagrams for (a) s-channel single top quark production and (b) t-channel production.

Single top quark events can be used to study the $Wtb$ coupling [7], and to measure the magnitude of the element $|V_{tb}|$ of the quark mixing matrix (the Cabibbo-Kobayashi-Maskawa (CKM) matrix [8]) without assuming only three generations of quarks [9]. The quark mixing matrix must be unitary, which for three families implies $|V_{tb}| \approx 1$ [10]. A smaller measured value would indicate the presence of a fourth quark family to make up the difference. Single top quark production can also be used to measure the top quark partial decay width $\Gamma(t \to Wb)$ [11] and hence the top quark lifetime.

The D0 collaboration has previously published limits [12] on single top quark production. The best 95% C.L. upper limits are $\sigma(p\bar{p} \to tb + X) < 6.4$ pb and $\sigma(p\bar{p} \to tqb + X) < 5.0$ pb. The CDF collaboration has also published limits on the cross sections [13].

This Letter describes a search for single top quark production using 0.9 fb$^{-1}$ of data produced at a center-of-mass energy of 1.96 TeV. The data were collected from 2002 to 2005 using the D0 detector [14] with triggers that required a jet and an electron or a muon. The search focuses on the final state consisting of one high transverse momentum ($p_T$) isolated lepton and missing transverse energy ($\not\!\!E_T$), together with a b-quark jet from the decay of the top quark ($t \to Wb \to \ell\nu b$). There is an additional b quark in s-channel production, and an additional light quark and b quark in t-channel production. The second b quark in the t-channel is rarely reconstructed since it is produced in the forward direction with low transverse momentum. The main backgrounds are: W bosons produced in association with jets; top quark pairs decaying into the lepton+jets and dilepton final states, when a jet or a lepton is not reconstructed; and multijet production, where a jet is misreconstructed as an electron, or a heavy-flavor quark decays to a muon that passes the isolation criteria.



We model the signal using the SINGLETOP NLO Monte Carlo (MC) event generator [15]. The event kinematics for both s-channel and t-channel reproduce distributions found in NLO calculations. The decays of the top quark and resulting W boson, with finite widths, are modeled in the SINGLETOP generator to preserve particle spin information. PYTHIA [16] is used to model the hadronization of the generated partons. For the tb search, we assume SM tqb as part of the background, and vice versa. For the tb+tqb search, we assume the SM ratio between the tb and tqb cross sections.

We simulate the $t\bar{t}$ and W+jets backgrounds using the ALPGEN leading-order MC event generator [17] and PYTHIA to model the hadronization. A parton-jet matching algorithm [18] is used to ensure there is no double-counting of the final states. The $t\bar{t}$ background is normalized to the integrated luminosity times the predicted $t\bar{t}$ cross section [2]. The multijet background is modeled using data that contain nonisolated leptons but which otherwise resemble the lepton+jets dataset. The W+jets background, combined with the multijet background, is normalized to the lepton+jets dataset separately for each analysis channel (defined by lepton flavor and jet multiplicity) before b-jet tagging (described later). In the W+jets background simulation, we scale the $Wb\bar{b}$ and $Wc\bar{c}$ components by a factor of $1.50 \pm 0.45$ to better represent higher-order effects [19]. This factor is determined by scaling the numbers of events in an admixture of light- and heavy-flavor W+jets MC events to data that have no b tags but which otherwise pass all selection cuts. The uncertainty assigned to this factor covers the expected dependence on kinematics and the assumption that the factor is the same for $Wb\bar{b}$ and $Wc\bar{c}$.
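The text does not spell out how the heavy-flavor factor is fitted. One simple way to realize "scaling an admixture of light- and heavy-flavor W+jets MC to untagged data" is a weighted least-squares fit of a single scale factor $k$ across channels, sketched below. The model (untagged data = other backgrounds + light-flavor W+jets + $k\times$ heavy-flavor W+jets), the $1/N_{\rm data}$ weights, and the toy numbers are all assumptions made for illustration, not the analysis procedure.

```python
# Hypothetical sketch: fit one heavy-flavor scale factor k to untagged data.
# Per-channel model (an assumption): data_c ~ other_c + light_c + k * heavy_c


def fit_hf_scale(data, other_bkg, w_light, w_heavy):
    """Weighted least-squares estimate of k and its statistical uncertainty.

    Minimizes sum_c (r_c - k * h_c)**2 / data_c, where r_c = data_c - other_c - light_c
    is the part of the untagged data that the heavy-flavor W+jets component must explain.
    """
    num = 0.0
    den = 0.0
    for d, o, l, h in zip(data, other_bkg, w_light, w_heavy):
        residual = d - o - l
        weight = 1.0 / d  # Poisson variance approximated by the observed count
        num += weight * h * residual
        den += weight * h * h
    k = num / den
    k_err = den ** -0.5  # from the curvature of the chi-square at its minimum
    return k, k_err


# Invented toy channels, constructed so that the true factor is exactly 1.5
k, k_err = fit_hf_scale(data=[590.0, 340.0], other_bkg=[100.0, 80.0],
                        w_light=[400.0, 200.0], w_heavy=[60.0, 40.0])
print(f"k = {k:.2f} +- {k_err:.2f}")
```

The statistical error from such a fit is only part of the story; the quoted $\pm 0.45$ also covers kinematic dependence and the $Wb\bar{b}$/$Wc\bar{c}$ equality assumption, which a single-parameter fit like this cannot capture.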

We pass the MC events through a GEANT-based simulation [20] of the D0 detector. To correct differences between the simulation and data, we apply weights to the simulated events to model the effects of the triggers, lepton identification and isolation requirements, and the energy scale of the jets. The b-tagging algorithm [21] is modeled by applying weights that account for the probability for each jet to be tagged as a function of jet flavor, $p_T$, and pseudorapidity $\eta$.
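Weighting simulated events by tag probabilities, rather than applying the tagger to each simulated jet, can be sketched as follows: if jets are tagged independently with probabilities $p_i$, the probability that an event has exactly $n$ tags follows from a short recursion over the jets. The independence assumption and the flat toy rates (only the 50% b-jet and 0.5% light-jet averages quoted in the next paragraph, with no $p_T$ or $\eta$ dependence and no charm rate) are simplifications made here.

```python
def tag_multiplicity(jet_probs):
    """Probability of exactly n b-tagged jets, n = 0..len(jet_probs).

    jet_probs[i] is the (assumed independent) tag probability of jet i.
    """
    dist = [1.0]  # before any jet is considered: zero tags with certainty
    for p in jet_probs:
        new = [0.0] * (len(dist) + 1)
        for n, prob in enumerate(dist):
            new[n] += prob * (1.0 - p)  # this jet is not tagged
            new[n + 1] += prob * p      # this jet is tagged
        dist = new
    return dist


def toy_tag_prob(flavor, eta):
    """Hypothetical flat rates built from the quoted averages only."""
    if flavor == "b" and abs(eta) < 2.4:
        return 0.50
    if flavor == "light":
        return 0.005
    raise ValueError("no toy rate for this jet")


jets = [("b", 0.4), ("light", -1.2)]
dist = tag_multiplicity([toy_tag_prob(f, eta) for f, eta in jets])
w_one_tag, w_two_tags = dist[1], dist[2]  # event weights for the 1-tag and 2-tag channels
print(dist)
```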

We choose events with two, three, or four jets, reconstructed using a cone algorithm [22] with radius $\mathcal{R} = \sqrt{(\Delta y)^2 + (\Delta\phi)^2} = 0.5$ (where $y$ is rapidity and $\phi$ is azimuthal angle) to cluster energy deposits in the calorimeter. The leading jet has $p_T > 25$ GeV and $|\eta| < 2.5$, the second leading jet has $p_T > 20$ GeV and $|\eta| < 3.4$, and subsequent jets have $p_T > 15$ GeV and $|\eta| < 3.4$. Events are required to have $15 < \not\!\!E_T < 200$ GeV and exactly one isolated electron with $p_T > 15$ GeV and $|\eta| < 1.1$ or one isolated muon with $p_T > 18$ GeV and $|\eta| < 2.0$. Misreconstructed events are rejected by requiring that the direction of the $\not\!\!E_T$ is not aligned or anti-aligned in azimuth with the lepton or a jet. To enhance the signal content of the selection, one or two of the jets are required to be identified as originating from long-lived b hadrons by a neural network b-jet tagging algorithm. The variables used to identify such jets rely on the presence and characteristics of a secondary vertex and tracks with high impact parameters inside the jet. For a 0.5% light-jet b-tag efficiency (the average mistag probability), we obtain a 50% average tag rate in data for b jets with $|\eta| < 2.4$.
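The selection above can be collected into one function, sketched here in Python. The container classes, the treatment of jet ordering (candidate jets are those with $p_T > 15$ GeV and $|\eta| < 3.4$, then the leading two must pass the tighter cuts), and especially the `min_dphi` threshold standing in for the "not aligned or anti-aligned" requirement are hypothetical; the text does not give the actual angular cut values. The function returns the analysis channel (lepton flavor, jet multiplicity, number of b tags).

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Jet:
    pt: float  # GeV
    eta: float
    phi: float
    btagged: bool = False


@dataclass
class Lepton:
    flavor: str  # "e" or "mu"
    pt: float    # GeV
    eta: float
    phi: float


@dataclass
class Event:
    jets: List[Jet]
    isolated_leptons: List[Lepton]
    met: float  # missing transverse energy, GeV
    met_phi: float


def delta_phi(a: float, b: float) -> float:
    """Azimuthal separation folded into [0, pi]."""
    d = abs(a - b) % (2.0 * math.pi)
    return 2.0 * math.pi - d if d > math.pi else d


def lepton_ok(lep: Lepton) -> bool:
    if lep.flavor == "e":
        return lep.pt > 15.0 and abs(lep.eta) < 1.1
    if lep.flavor == "mu":
        return lep.pt > 18.0 and abs(lep.eta) < 2.0
    return False


def select(ev: Event, min_dphi: float = 0.2) -> Optional[Tuple[str, int, int]]:
    """Return (lepton flavor, n jets, n b tags) for selected events, else None.

    min_dphi is a hypothetical stand-in for the MET alignment cuts.
    """
    jets = sorted((j for j in ev.jets if j.pt > 15.0 and abs(j.eta) < 3.4),
                  key=lambda j: j.pt, reverse=True)
    if not 2 <= len(jets) <= 4:
        return None
    if not (jets[0].pt > 25.0 and abs(jets[0].eta) < 2.5 and jets[1].pt > 20.0):
        return None
    if not 15.0 < ev.met < 200.0:
        return None
    leptons = [lep for lep in ev.isolated_leptons if lepton_ok(lep)]
    if len(leptons) != 1:
        return None
    # Reject MET aligned (dphi near 0) or anti-aligned (dphi near pi) with the lepton or a jet
    for obj in [leptons[0], *jets]:
        dphi = delta_phi(ev.met_phi, obj.phi)
        if min(dphi, math.pi - dphi) < min_dphi:
            return None
    n_tags = sum(j.btagged for j in jets)
    if n_tags not in (1, 2):
        return None
    return (leptons[0].flavor, len(jets), n_tags)


ev = Event(jets=[Jet(50.0, 0.3, 0.0, True), Jet(30.0, -1.0, 2.0)],
           isolated_leptons=[Lepton("e", 40.0, 0.5, 1.0)], met=35.0, met_phi=-1.5)
print(select(ev))
```

As written, "exactly one isolated lepton" is counted only among leptons passing the $p_T$ and $\eta$ cuts; a full implementation would need the analysis definition of lepton vetoes.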

We select 1,398 b-tagged lepton+jets data events, which we expect to contain 62 ± 13 single top quark events. To increase the search sensitivity, we divide these events into twelve independent analysis channels based on the lepton flavor (e or μ), jet multiplicity (2, 3, or 4), and number of identified b jets (1 or 2). We do this because the signal acceptance and signal-to-background ratio differ significantly from channel to channel. Event yields are given in Table I, shown separated only by jet multiplicity for simplicity. The acceptances for single top quark signal as percentages of the total production cross sections are (3.2 ± 0.4)% for tb and (2.1 ± 0.3)% for tqb.
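As a cross-check of the quoted signal expectation, multiplying each NLO cross section by its acceptance and by the integrated luminosity (0.9 fb$^{-1}$ = 900 pb$^{-1}$) gives the expected number of selected signal events; the sketch also enumerates the twelve channels. It prints about 25.3 tb and 37.4 tqb events (62.8 in total), consistent within rounding with the 26 and 36 events of Table I and the quoted 62 ± 13.

```python
from itertools import product

# Twelve channels: lepton flavor x jet multiplicity x number of b tags
channels = list(product(("e", "mu"), (2, 3, 4), (1, 2)))

# NLO SM cross sections (pb) and acceptances quoted in the text
sigma_pb = {"tb": 0.88, "tqb": 1.98}
acceptance = {"tb": 0.032, "tqb": 0.021}
lumi_inv_pb = 900.0  # 0.9 fb^-1 expressed in pb^-1

expected = {p: sigma_pb[p] * acceptance[p] * lumi_inv_pb for p in sigma_pb}
total = sum(expected.values())
for p, n in expected.items():
    print(f"{p}: {n:.1f} events")
print(f"total: {total:.1f} events, {len(channels)} channels")
```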

The dominant contributions to the uncertainties on the backgrounds come from: normalization of the $t\bar{t}$ background (18% of the $t\bar{t}$ component), which includes a term to account for the top quark mass uncertainty; normalization of the W+jets and multijet backgrounds to data (17-27%), which includes the uncertainty on the heavy-flavor fraction of the model; the jet energy scale corrections (1-20%); and the b-tagging probabilities (12-17% for double-tagged events). The uncertainty on the integrated luminosity is 6%; all other sources contribute at the few percent level. The uncertainties from the jet energy scale corrections and the b-tagging probabilities affect both the shape and normalization of the simulated distributions. Having selected the data samples, we check that the background model reproduces the data in a multitude of variables (e.g., transverse momenta, pseudorapidities, azimuthal angles, masses) for each analysis channel and find agreement within uncertainties.

Since we expect single top quark events to consti- 
tute only a small fraction of the selected event samples, 
and since the background uncertainty is larger than the 
expected signal, a counting experiment will not have 
sufficient sensitivity to verify their presence. We proceed 
instead to calculate multivariate discriminants that sepa- 
rate the signal from background and thus enhance the 
probability to observe single top quarks. We use decision
trees [23] to create these discriminants. A decision
tree is a machine-learning technique that applies cuts 
iteratively to classify events. The discrimination power 
is further improved by averaging over many decision 
trees constructed using the adaptive boosting algorithm 
AdaBoost [24]. We refer to this average as a boosted 
decision tree. 
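To make the boosting step concrete, here is a minimal pure-Python sketch of AdaBoost. It is a simplification: the analysis boosts full decision trees, whereas this example boosts single-cut "stumps"; the toy data, the number of rounds, and the mapping of the weighted vote onto [0, 1] are choices made here for illustration.

```python
import math


def best_stump(X, y, w):
    """Single-variable cut with the lowest weighted error; y is +1 (signal) or -1."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for polarity in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (polarity if xi[f] >= thr else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, polarity)
    return best


def stump_predict(f, thr, polarity, x):
    return polarity if x[f] >= thr else -polarity


def adaboost(X, y, n_rounds=10):
    """Each round fits a stump, then up-weights the events it misclassified."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(n_rounds):
        err, f, thr, pol = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, f, thr, pol))
        w = [wi * math.exp(-alpha * yi * stump_predict(f, thr, pol, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def score(ensemble, x):
    """Weighted vote mapped to [0, 1]: background near 0, signal near 1."""
    norm = sum(alpha for alpha, *_ in ensemble)
    if norm == 0.0:
        return 0.5
    vote = sum(alpha * stump_predict(f, thr, pol, x) for alpha, f, thr, pol in ensemble)
    return 0.5 * (1.0 + vote / norm)


# Invented two-variable toy: signal sits at larger values of the first variable
signal = [(0.6 + 0.04 * i, 0.1 * i) for i in range(10)]
background = [(0.04 * i, 1.0 - 0.1 * i) for i in range(10)]
X = signal + background
y = [1] * len(signal) + [-1] * len(background)
model = adaboost(X, y)
print([round(score(model, x), 2) for x in (signal[0], background[0])])
```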

We identify 49 variables from an analysis of the signal and background Feynman diagrams [25], studies of single top quark production at NLO [26], and from other analyses [3, 13]. The variables may be classified into three categories: individual object kinematics, global event kinematics, and variables based on angular correlations. Those with the most discrimination power include the invariant mass of all the jets in the event, the invariant mass of the reconstructed W boson and the highest-$p_T$ b-tagged jet, the angle between the highest-$p_T$ b-tagged jet and the lepton in the rest frame of the reconstructed top quark, and the lepton charge times the pseudorapidity of the untagged jet. We find that reducing the number of variables always reduces the sensitivity of the analysis.

TABLE I: Numbers of expected and observed events in 0.9 fb$^{-1}$ for e and μ, 1 b tag and 2 b tag channels combined. The total background uncertainties are smaller than the component uncertainties added in quadrature because of anticorrelation between the W+jets and multijet backgrounds resulting from the background normalization procedure.

Source | 2 jets | 3 jets | 4 jets
tb | 16 ± 3 | 8 ± 2 | 2 ± 1
tqb | 20 ± 4 | 12 ± 3 | 4 ± 1
$t\bar{t} \to \ell\ell$ | 39 ± 9 | 32 ± 7 | 11 ± 3
$t\bar{t} \to \ell$+jets | 20 ± 5 | 103 ± 25 | 143 ± 33
$Wb\bar{b}$ | 261 ± 55 | 120 ± 24 | 35 ± 7
$Wc\bar{c}$ | 151 ± 31 | 85 ± 17 | 23 ± 5
$Wjj$ | 119 ± 25 | 43 ± 9 | 12 ± 2
Multijets | 95 ± 19 | 77 ± 15 | 29 ± 6
Total background | 686 ± 41 | 460 ± 39 | 253 ± 38
Data | 697 | 455 | 246
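The Table I caption attributes the smaller-than-quadrature background totals to anticorrelated normalizations. The general rule is $\sigma_{\rm tot}^2 = \sum_{ij} \rho_{ij}\,\sigma_i \sigma_j$; the sketch below applies it to a hypothetical two-component toy (the ±20 and ±15 event uncertainties and $\rho = -0.8$ are invented, not taken from the analysis). It prints 25.0 for the uncorrelated case and 12.0 with the anticorrelation.

```python
import math


def total_uncertainty(sigmas, corr):
    """sqrt(sum_ij corr[i][j] * sigma_i * sigma_j) for correlated components."""
    n = len(sigmas)
    var = sum(corr[i][j] * sigmas[i] * sigmas[j] for i in range(n) for j in range(n))
    return math.sqrt(var)


sigmas = [20.0, 15.0]  # hypothetical component uncertainties (events)
uncorrelated = total_uncertainty(sigmas, [[1.0, 0.0], [0.0, 1.0]])
anticorrelated = total_uncertainty(sigmas, [[1.0, -0.8], [-0.8, 1.0]])
print(f"quadrature: {uncorrelated:.1f}, with rho = -0.8: {anticorrelated:.1f}")
```

When two components are normalized together to the same data, an upward shift in one forces a downward shift in the other, which is what a negative $\rho$ encodes.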

We use a boosted decision tree (DT) in each of the twelve analysis channels for three searches: tb+tqb, tqb, and tb. These 36 DTs are trained to separate one of the signals from the sum of the $t\bar{t}$ and W+jets backgrounds. One-third of the MC signal and background events is used for training; the remaining two-thirds are used to determine the acceptances in an unbiased manner. A boosted decision tree produces a quasi-continuous output distribution $O_{\rm DT}$ ranging from zero to one, with background peaking closer to zero and signal peaking closer to one. Figures 2(a) and 2(b) show the DT output distributions for two background-dominated data samples to demonstrate the agreement between background model and data. Figure 2(c) shows the high end of the sum of the 12 tb+tqb DT outputs to illustrate where the signal is expected, and Fig. 2(d) shows the invariant mass of the reconstructed W boson with the highest-$p_T$ b-tagged jet (where the neutrino longitudinal momentum has been chosen to be the smaller absolute value of the two possible solutions to the mass equation), for events in a signal-enhanced region with $O_{\rm DT} > 0.65$. The background peaks near the top quark mass because the DTs select events similar to single top quark events.
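The two-fold ambiguity mentioned above comes from the W-mass constraint. Assuming a massless lepton and neutrino and identifying the neutrino transverse momentum with the $\not\!\!E_T$ vector, $M_W^2 = (p_\ell + p_\nu)^2$ is a quadratic in $p_z^\nu$ with solutions $p_z^\nu = [\mu\, p_z^\ell \pm E_\ell \sqrt{\mu^2 - (p_T^\ell)^2 (p_T^\nu)^2}\,]/(p_T^\ell)^2$, where $\mu = M_W^2/2 + \vec{p}_T^{\,\ell} \cdot \vec{p}_T^{\,\nu}$. The sketch below picks the smaller $|p_z^\nu|$ as in the text; the default $M_W = 80.4$ GeV and clamping a negative discriminant to zero are assumptions made here.

```python
import math


def neutrino_pz_solutions(lep, met_x, met_y, m_w=80.4):
    """Both solutions of the W-mass constraint for the neutrino p_z (GeV).

    lep = (px, py, pz) of a massless charged lepton; the massless neutrino has
    transverse momentum (met_x, met_y).
    """
    px, py, pz = lep
    e_lep = math.sqrt(px * px + py * py + pz * pz)
    pt2_lep = px * px + py * py
    pt2_nu = met_x * met_x + met_y * met_y
    mu = 0.5 * m_w * m_w + px * met_x + py * met_y
    disc = mu * mu - pt2_lep * pt2_nu
    root = e_lep * math.sqrt(max(disc, 0.0))  # assumption: clamp a negative discriminant
    return (mu * pz + root) / pt2_lep, (mu * pz - root) / pt2_lep


def neutrino_pz(lep, met_x, met_y, m_w=80.4):
    """Choose the solution with the smaller absolute value, as described in the text."""
    return min(neutrino_pz_solutions(lep, met_x, met_y, m_w), key=abs)


def invariant_mass(*vectors):
    """Invariant mass of a sum of four-vectors given as (E, px, py, pz)."""
    e, px, py, pz = (sum(v[k] for v in vectors) for k in range(4))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


# Toy check with invented momenta: build a "W" from a known lepton and neutrino
# (its mass is whatever these momenta give), then recover the neutrino p_z.
lep = (30.0, 0.0, 10.0)
nu = (-20.0, 25.0, 5.0)
e_lep = math.sqrt(sum(c * c for c in lep))
e_nu = math.sqrt(sum(c * c for c in nu))
m_w_toy = invariant_mass((e_lep, *lep), (e_nu, *nu))
pz_nu = neutrino_pz(lep, nu[0], nu[1], m_w=m_w_toy)
print(neutrino_pz_solutions(lep, nu[0], nu[1], m_w=m_w_toy), pz_nu)
```

With the reconstructed neutrino in hand, the quantity plotted in Fig. 2(d) would follow from `invariant_mass` applied to the lepton, neutrino, and highest-$p_T$ b-tagged jet four-vectors.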

We apply a Bayesian approach [28] to measure the
single top quark production cross section. We form a 
binned likelihood as a product over all bins and channels 
FIG. 2: Boosted decision tree output distributions for (a) a W+jets-dominated control sample, (b) a $t\bar{t}$-dominated control sample, and (c) the high-discriminant region of the sum of all 12 tb+tqb DTs. For (a) and (b), $H_T = E_T^{\ell} + \not\!\!E_T + \sum_{\rm all\ jets} p_T$. Panel (d) shows the invariant mass of the reconstructed W boson and highest-$p_T$ b-tagged jet for events with $O_{\rm DT} > 0.65$. The hatched bands show the ±1 standard deviation uncertainty on the background. The expected signal is shown using the measured cross section.

(lepton flavor, jet multiplicity, and tag multiplicity) of 
the decision tree discriminant, separately for the tb+tqb, 
tqb, and tb analyses. We assume a Poisson distribution 
for the observed counts and flat nonnegative prior probabilities
for the signal cross sections. Systematic uncertainties
and their correlations are taken into account
by integrating over the signal acceptances, background 
yields, and integrated luminosity with Gaussian priors for 
each systematic uncertainty. The final posterior proba- 
bility density is computed as a function of the production 
cross section. For each analysis, we measure the cross 
section using the position of the posterior density peak 
and we take the 68% asymmetric interval about the peak 
as the uncertainty on the measurement. 
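The posterior calculation can be sketched as follows under simplifying assumptions: a single cross section $\sigma$, a per-bin signal yield per pb $s_i$ (acceptance times luminosity), fixed backgrounds $b_i$, and no systematic uncertainties (the analysis integrates over Gaussian priors for those, which this sketch omits). Growing the 68% interval outward from the peak by always adding the denser neighbouring bin is one way to realize an "asymmetric interval about the peak" and is an assumption here; the toy inputs are invented.

```python
import math


def log_likelihood(sigma, data, bkg, sig_per_pb):
    """Binned Poisson log-likelihood; bin i expects b_i + sigma * s_i events."""
    total = 0.0
    for n, b, s in zip(data, bkg, sig_per_pb):
        mu = b + sigma * s
        total += n * math.log(mu) - mu - math.lgamma(n + 1)
    return total


def posterior(data, bkg, sig_per_pb, sigma_max=10.0, steps=1000):
    """Normalized posterior on a grid of sigma >= 0 with a flat prior (no systematics)."""
    grid = [sigma_max * i / steps for i in range(steps + 1)]
    logs = [log_likelihood(x, data, bkg, sig_per_pb) for x in grid]
    peak = max(logs)
    dens = [math.exp(v - peak) for v in logs]  # subtract the maximum to avoid underflow
    norm = sum(dens)
    return grid, [d / norm for d in dens]


def peak_and_interval(grid, probs, level=0.68):
    """Posterior mode and an interval grown around it until it holds `level`."""
    i_peak = max(range(len(probs)), key=probs.__getitem__)
    lo = hi = i_peak
    content = probs[i_peak]
    while content < level:
        left = probs[lo - 1] if lo > 0 else -1.0
        right = probs[hi + 1] if hi < len(probs) - 1 else -1.0
        if left >= right:  # extend toward the denser neighbouring bin
            lo -= 1
            content += left
        else:
            hi += 1
            content += right
    return grid[i_peak], grid[lo], grid[hi]


# Invented toy: three bins whose data equal the expectation at sigma = 3 pb
data = [56, 29, 17]
bkg = [50.0, 20.0, 5.0]
sig_per_pb = [2.0, 3.0, 4.0]
grid, probs = posterior(data, bkg, sig_per_pb)
mode, lo, hi = peak_and_interval(grid, probs)
print(f"sigma = {mode:.2f} (+{hi - mode:.2f} / -{mode - lo:.2f}) pb")
```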

We test the validity of the cross section measurement
procedure using six ensembles of pseudo-datasets
selected from the full set of tb+tqb signal and background 
events weighted to represent their expected proportions. 
A Poisson distribution with a mean equal to the total 
number of selected events is randomly sampled to deter- 
mine the number of events in each pseudo-dataset. Each 
ensemble has a different assumed tb+tqb cross section 
between 2 pb and 8 pb. No significant bias is seen in the 
mean of the measured cross sections for these ensembles. 
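A reduced version of this ensemble test can be sketched with per-bin Poisson fluctuations. For independent bins, this is statistically equivalent to drawing a Poisson total and distributing it according to the expected proportions. Each pseudo-dataset is fit with the posterior peak under a flat nonnegative prior, which coincides with the maximum-likelihood value restricted to $\sigma \ge 0$. The binning, yields, number of pseudo-datasets, and seed are invented, and systematic uncertainties are ignored.

```python
import math
import random


def poisson(rng, lam):
    """Knuth's multiplication method; adequate for the modest means used here."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def fit_sigma(data, bkg, sig_per_pb, sigma_max=16.0, steps=400):
    """Posterior peak for a flat prior on sigma >= 0 (grid maximum likelihood)."""
    best_x, best_ll = 0.0, -math.inf
    for i in range(steps + 1):
        x = sigma_max * i / steps
        ll = sum(n * math.log(b + x * s) - (b + x * s)
                 for n, b, s in zip(data, bkg, sig_per_pb))
        if ll > best_ll:
            best_x, best_ll = x, ll
    return best_x


def ensemble_mean(true_sigma, bkg, sig_per_pb, n_pseudo=400, seed=1):
    """Average fitted cross section over pseudo-datasets generated at true_sigma."""
    rng = random.Random(seed)
    fits = []
    for _ in range(n_pseudo):
        pseudo = [poisson(rng, b + true_sigma * s) for b, s in zip(bkg, sig_per_pb)]
        fits.append(fit_sigma(pseudo, bkg, sig_per_pb))
    return sum(fits) / len(fits)


bkg = [50.0, 20.0, 5.0]       # invented toy backgrounds per bin
sig_per_pb = [2.0, 3.0, 4.0]  # invented signal yield per pb per bin
means = {s: ensemble_mean(s, bkg, sig_per_pb) for s in (2.0, 5.0, 8.0)}
for true_sigma, mean in means.items():
    print(f"true {true_sigma:.1f} pb -> mean fit {mean:.2f} pb")
```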

The expected SM and measured posterior probability densities for tb+tqb are shown in Fig. 3. We use the measured posterior density distribution for tb+tqb shown in Fig. 3, and similar distributions for tqb and tb, to make the following measurements: $\sigma(p\bar{p} \to tb + X,\ tqb + X) = 4.9 \pm 1.4$ pb, $\sigma(p\bar{p} \to tqb + X) = 4.2$ pb with asymmetric uncertainties, and $\sigma(p\bar{p} \to tb + X) = 1.0 \pm 0.9$ pb. These results are consistent with the SM expectations. The uncertainties include statistical and systematic components combined. The data statistics contribute 1.2 pb to the total 1.4 pb uncertainty on the tb+tqb cross section.




FIG. 3: Expected SM and measured Bayesian posterior probability densities for the tb+tqb cross section, plotted versus the tb+tqb cross section in pb. The shaded regions indicate one standard deviation above and below the peak positions.

We assess how strongly this analysis rules out (or is 
expected to rule out) the background-only hypothesis by 
measuring the probability for the background to fluc- 
tuate up to give the measured (or SM) value of the 
tb+tqb cross section or greater. From an ensemble of 
over 68,000 background-only pseudo-datasets, with all 
systematic uncertainties included, we find that the back- 
ground fluctuates up to give the SM cross section of 
2.9 pb or greater 1.9% of the time, corresponding to 
an expected significance of 2.1 standard deviations (SD) 
for a Gaussian distribution. The probability that the 
background fluctuates up to produce the measured cross 
section of 4.9 pb or greater is 0.035%, corresponding to 
a significance for our result of 3.4 SD. Using a second 
ensemble of pseudo-datasets which includes a SM tb+tqb 
signal with 2.9 pb cross section, with all systematic uncer- 
tainties included, we find the probability to measure a 
cross section of at least 4.9 pb to be 11%. 
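The quoted significances follow the one-sided Gaussian convention $Z = \Phi^{-1}(1 - p)$, where $\Phi$ is the standard normal cumulative distribution. The sketch below, using Python's `statistics.NormalDist`, reproduces to one decimal place the conversions quoted in this Letter for the decision tree, matrix element, and Bayesian neural network analyses.

```python
from statistics import NormalDist


def p_to_sigma(p):
    """One-sided Gaussian significance Z such that P(X > Z) = p for a unit normal."""
    return NormalDist().inv_cdf(1.0 - p)


# (tail probability, quoted significance) pairs taken from the text
quoted = [(0.00035, 3.4), (0.019, 2.1), (0.0021, 2.9), (0.0089, 2.4), (0.037, 1.8), (0.097, 1.3)]
for p, z in quoted:
    print(f"p = {p:.5f} -> {p_to_sigma(p):.2f} SD (quoted {z})")
```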

We apply two alternative methods to calculate tb+tqb discriminants. The first technique calculates the probability for each event to be signal or background based on the leading-order matrix element description of each process for two-jet and three-jet events [29]. It takes as input the four-momenta of the reconstructed objects and incorporates the b-tagging information for each event. This is a powerful method to extract the small signal because it encodes the kinematic information of the signal and background processes at the parton level. The probability that the background fluctuates up to give the SM cross section or greater in the matrix element analysis is 3.7% (1.8 SD). We measure $\sigma(p\bar{p} \to tb + X,\ tqb + X) = 4.6$ pb with asymmetric uncertainties. The probability for the background to fluctuate up to give a cross section of at least 4.6 pb is 0.21% (2.9 SD). The second alternative method uses Bayesian neural networks [30] to separate tb+tqb signal from background. We train the networks separately for each analysis channel on a sample of signal events and on an equal-sized sample of background events containing the background components in their expected proportions, using 24 input variables (a subset of the 49 used in the boosted decision tree analysis). Large numbers of networks are averaged, resulting in better separation than can be achieved with a single network. The probability that the background fluctuates up to give the SM cross section or greater in the Bayesian neural network analysis is 9.7% (1.3 SD). We measure $\sigma(p\bar{p} \to tb + X,\ tqb + X) = 5.0 \pm 1.9$ pb. The probability for the background to fluctuate up to give a cross section of at least 5.0 pb is 0.89% (2.4 SD).

The three analyses are correlated since they use the 
same signal and background models and data, with 
almost the same systematic uncertainties. We take the 
decision tree measurement as our main result because 
this method has the lowest a priori probability for the 
background to have fluctuated up to give the SM cross 
section or greater. That is, we expect the decision tree 
analysis to rule out the background-only hypothesis with 
greatest significance. 

We use the decision tree measurement of the tb+tqb cross section to derive a first direct measurement of the strength of the V-A coupling $|V_{tb} f_1^L|$ in the $Wtb$ vertex, where $f_1^L$ is an arbitrary left-handed form factor [31]. We measure $|V_{tb} f_1^L| = 1.3 \pm 0.2$. This measurement assumes $|V_{td}|^2 + |V_{ts}|^2 \ll |V_{tb}|^2$ and a pure V-A and CP-conserving $Wtb$ interaction. Assuming in addition that $f_1^L = 1$ and using a flat prior for $|V_{tb}|^2$ from 0 to 1, we obtain $0.68 < |V_{tb}| < 1$ at 95% C.L. These measurements make no assumptions about the number of quark families or CKM matrix unitarity.
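Because the single top quark cross section scales as $|V_{tb} f_1^L|^2$, a back-of-the-envelope version of this measurement divides the measured cross section by the SM prediction computed with $|V_{tb}| = 1$ (0.88 + 1.98 = 2.86 pb from the NLO values quoted earlier) and takes the square root. This simplified sketch propagates only the experimental uncertainty and skips the Bayesian treatment and theoretical uncertainties used in the analysis; it prints `|Vtb f1| = 1.3 +- 0.2`, in line with the quoted result.

```python
import math

sigma_meas, sigma_meas_err = 4.9, 1.4  # measured tb+tqb cross section (pb)
sigma_sm = 0.88 + 1.98                 # NLO tb + tqb prediction (pb), computed for |Vtb| = 1

# sigma scales as |Vtb f1|^2, so |Vtb f1| = sqrt(sigma_meas / sigma_sm)
vtb_f1 = math.sqrt(sigma_meas / sigma_sm)
# Propagate only the experimental uncertainty: d|V| / |V| = (1/2) d(sigma) / sigma
vtb_f1_err = 0.5 * (sigma_meas_err / sigma_meas) * vtb_f1
print(f"|Vtb f1| = {vtb_f1:.1f} +- {vtb_f1_err:.1f}")
```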

To summarize, we have performed a search for single top quark production using 0.9 fb$^{-1}$ of data collected by the D0 experiment at the Tevatron collider. We find an excess of events over the background prediction in the high discriminant output region and interpret it as evidence for single top quark production. The excess has a significance of 3.4 standard deviations. We use the boosted decision tree discriminant output distributions to make the first measurement of the single top quark cross section: $\sigma(p\bar{p} \to tb + X,\ tqb + X) = 4.9 \pm 1.4$ pb. We use this cross section measurement to make the first direct measurement of the CKM matrix element $|V_{tb}|$ without assuming CKM matrix unitarity, and find $0.68 < |V_{tb}| < 1$ at 95% C.L.